Stable soft extrapolation of entire functions
Soft extrapolation refers to the problem of recovering a function from its
samples, multiplied by a fast-decaying window and perturbed by additive noise,
over an interval which is potentially larger than the essential support
of the window. A core theoretical question is to provide bounds on the possible
amount of extrapolation, depending on the sample perturbation level and the
function prior. In this paper we consider soft extrapolation of entire
functions of finite order and type (containing the class of bandlimited
functions as a special case), multiplied by a super-exponentially decaying
window (such as a Gaussian). We consider a weighted least-squares polynomial
approximation with a judiciously chosen number of terms and a number of samples
which scales linearly with the degree of approximation. It is shown that this
simple procedure provides stable recovery with an extrapolation factor which
scales logarithmically with the perturbation level and is inversely
proportional to the characteristic lengthscale of the function. The pointwise
extrapolation error exhibits a Hölder-type continuity with an exponent
derived from weighted potential theory, which changes from 1 near the available
samples, to 0 when the extrapolation distance reaches the characteristic
smoothness length scale of the function. The algorithm is asymptotically
minimax, in the sense that there is essentially no better algorithm yielding
meaningfully lower error over the same smoothness class. When viewed in the
dual domain, the above problem corresponds to (stable) simultaneous
de-convolution and super-resolution for objects of small space/time extent. Our
results then show that the amount of achievable super-resolution is inversely
proportional to the object size, and therefore can be significant for small
objects.
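To make the recovery procedure concrete, here is a minimal numerical sketch of
weighted least-squares polynomial extrapolation from noisy, Gaussian-windowed
samples. The test function, window width, polynomial degree, oversampling
factor, and noise level are illustrative assumptions, not the paper's
prescribed choices.

```python
# Minimal sketch: soft extrapolation via weighted least-squares polynomial
# fitting. All parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.cos(3 * x)                        # entire test function (assumed)
sigma = 0.3
window = lambda x: np.exp(-x**2 / (2 * sigma**2))  # Gaussian window

degree = 10                    # number of polynomial terms (illustrative choice)
n_samples = 4 * degree         # sample count scaling linearly with the degree
eps = 1e-6                     # additive perturbation level
x = np.linspace(-1.0, 1.0, n_samples)
y = window(x) * f(x) + eps * rng.standard_normal(n_samples)

# Weighted least squares: fit y ~ window(x) * p(x) for a polynomial p.
V = np.vander(x, degree + 1, increasing=True) * window(x)[:, None]
coef, *_ = np.linalg.lstsq(V, y, rcond=None)

# Evaluate p out to the edges of the interval, beyond the essential support
# of the window, and compare with the underlying function f.
x_eval = np.linspace(-1.0, 1.0, 201)
p_eval = np.vander(x_eval, degree + 1, increasing=True) @ coef
print("max pointwise extrapolation error:", np.max(np.abs(p_eval - f(x_eval))))
```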
An analysis of training and generalization errors in shallow and deep networks
This paper is motivated by an open problem around deep networks, namely, the
apparent absence of over-fitting despite large over-parametrization, which
allows perfect fitting of the training data. We analyze this phenomenon in the
case of regression problems in which each unit evaluates a
periodic activation function. We argue that the minimal expected value of the
square loss is inappropriate for measuring the generalization error in the
approximation of compositional functions, since it fails to take full advantage
of the compositional structure. Instead, we measure the generalization error in
the sense of maximum loss, and sometimes as a pointwise error. We give estimates
of exactly how many parameters ensure both zero training error and a good
generalization error. We prove that a solution of a regularization problem is
guaranteed to yield a good training error as well as a good generalization
error, and we estimate how much error to expect at which test data.
Comment: 21 pages; accepted for publication in Neural Networks.
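As a small illustration of the setting, the sketch below fits a shallow,
over-parametrized network whose units evaluate a periodic (sine) activation,
using a ridge-regularized least-squares solve on the output weights, and
reports the error in the maximum-loss sense. The architecture, random inner
weights, target function, and regularization constant are assumptions made for
illustration, not the construction analyzed in the paper.

```python
# Minimal sketch: shallow network with periodic (sine) activations, fit by
# regularized least squares on the output weights; errors are reported in the
# maximum (sup-norm) sense rather than as an expected square loss.
import numpy as np

rng = np.random.default_rng(1)

def target(x):                                  # assumed regression target
    return np.sin(2 * np.pi * x) + 0.3 * np.cos(6 * np.pi * x)

n_train, width = 50, 200                        # width exceeds n_train
x_train = rng.uniform(0.0, 1.0, n_train)
y_train = target(x_train)

# Hidden units x -> sin(w * x + b) with fixed random inner weights.
w = rng.normal(0.0, 10.0, width)
b = rng.uniform(0.0, 2 * np.pi, width)
def features(x):
    return np.sin(np.outer(x, w) + b)           # shape (len(x), width)

# Regularization problem: ridge-penalized least squares on the output weights.
lam = 1e-8
A = features(x_train)
coef = np.linalg.solve(A.T @ A + lam * np.eye(width), A.T @ y_train)

# Training error and generalization error, both in the maximum-loss sense.
x_test = np.linspace(0.0, 1.0, 1000)
train_err = np.max(np.abs(A @ coef - y_train))
test_err = np.max(np.abs(features(x_test) @ coef - target(x_test)))
print(f"max training error: {train_err:.2e}, max test error: {test_err:.2e}")
```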
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review
The paper characterizes classes of functions for which deep learning can be
exponentially better than shallow learning. Deep convolutional networks are a
special case satisfying these conditions, though weight sharing is not the main
reason for their exponential advantage.
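For context, the sketch below gives one concrete instance of the kind of
compositional function class such results concern: a target built on a binary
tree whose constituent functions each depend on only two variables, so a deep
network matching the tree only needs to approximate low-dimensional pieces.
The specific constituent functions here are arbitrary illustrative choices,
not taken from the paper.

```python
# One compositional target on a binary tree; every constituent function
# (h1, h2, h3) depends on only two variables. The choices below are arbitrary.
import numpy as np

def f(x):
    """f(x1, x2, x3, x4) = h3(h1(x1, x2), h2(x3, x4))."""
    h1 = np.tanh(x[0] + 2.0 * x[1])    # constituent of (x1, x2)
    h2 = np.sin(x[2] * x[3])           # constituent of (x3, x4)
    return np.exp(-(h1 - h2) ** 2)     # h3: constituent of (h1, h2)

# A deep network mirroring this tree approximates three bivariate functions,
# whereas a generic shallow approximation must treat f as an unstructured
# function of all four variables.
print(f(np.array([0.1, 0.2, 0.3, 0.4])))
```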